Performance Evaluation and Analysis for Conjugate Gradient Solver on Heterogeneous (Multi-GPUs/Multi-CPUs) platforms
Authors
Abstract
High performance computing (HPC) makes it possible to solve computationally intensive problems in a reasonable amount of time and offers many advantages for large applications in various fields of science and industry. Current multi-core processors, especially graphics processing units (GPUs), have quickly evolved into efficient accelerators for data-parallel computing. They remain programmable in parallel while providing high computing throughput. In this paper, the authors present an implementation and performance analysis of a sparse iterative linear solver on heterogeneous multi-CPU/multi-GPU architectures using the PARALUTION and StarPU libraries. In particular, the authors compare the performance of a parallel preconditioned conjugate gradient (PCG) solver on different platforms. Experiments conducted on GPU platforms show a significant speedup compared to central processing unit (CPU) implementations. To provide the highest performance, the system supports multi-CPU/multi-GPU architectures, on which it scales well.
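For readers unfamiliar with the PCG method named in the abstract, the following is a minimal single-threaded sketch in plain Python. It is not the PARALUTION or StarPU implementation evaluated in the paper. The Jacobi (diagonal) preconditioner, the CSR storage layout, and the function names `csr_matvec`, `dot`, and `pcg_jacobi` are assumptions chosen for illustration, since the abstract does not say which preconditioner or matrix format was used.

```python
import math


def csr_matvec(data, indices, indptr, x):
    """Multiply a sparse matrix in CSR form by a dense vector x."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y


def dot(u, v):
    """Inner product of two dense vectors."""
    return sum(a * b for a, b in zip(u, v))


def pcg_jacobi(data, indices, indptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite CSR matrix A.

    Uses conjugate gradient with a Jacobi preconditioner (an assumption
    made for this sketch). Returns (x, iterations_used).
    """
    n = len(b)
    # Jacobi preconditioner: M^-1 is the inverse of the diagonal of A.
    inv_diag = [0.0] * n
    for row in range(n):
        for k in range(indptr[row], indptr[row + 1]):
            if indices[k] == row:
                inv_diag[row] = 1.0 / data[k]

    x = [0.0] * n
    r = list(b)  # residual b - A x for the initial guess x = 0
    z = [m * ri for m, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    if rz == 0.0:  # b is zero, so x = 0 is already the solution
        return x, 0
    b_norm = math.sqrt(dot(b, b))

    for iteration in range(1, max_iter + 1):
        Ap = csr_matvec(data, indices, indptr, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        # Stop once the relative residual norm falls below tol.
        if math.sqrt(dot(r, r)) / b_norm < tol:
            return x, iteration
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

The sparse matrix-vector product and the inner products dominate the cost of each iteration, and these are the data-parallel kernels that a GPU implementation would typically offload.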
Similar resources
From MPI to MPI+OpenACC: Conversion of a legacy FORTRAN PCG solver for the spherical Laplace equation
A real-world example of adding OpenACC to a legacy MPI FORTRAN Preconditioned Conjugate Gradient code is described, and timing results for multi-node multi-GPU runs are shown. The code is used to obtain three-dimensional spherical solutions to the Laplace equation. Its application is finding potential field solutions of the solar corona, a useful tool in space weather modeling. We highlight key...
Reconfigurable Hardware Generation of Multigrid Solvers with Conjugate Gradient Coarse-Grid Solution
Not only in the field of high-performance computing (HPC), field programmable gate arrays (FPGAs) are a soaringly popular accelerator technology. However, they use a completely different programming paradigm and tool set compared to CPUs or even GPUs, adding extra development steps and requiring special knowledge, hindering widespread use in scientific computing. To bridge this programmability ...
Design and Optimization of OpenFOAM-based CFD Applications for Modern Hybrid and Heterogeneous HPC Platforms
Amani AlOnazi. The progress of high performance computing platforms is dramatic, and most of the simulations carried out on these platforms result in improvements on one level, yet expose shortcomings of current CFD packages. Therefore, hardware-aware design and optimizations are crucial ...
A flexible Patch-based lattice Boltzmann parallelization approach for heterogeneous GPU-CPU clusters
Sustaining a large fraction of single GPU performance in parallel computations is considered to be the major problem of GPU-based clusters. In this article, this topic is addressed in the context of a lattice Boltzmann flow solver that is integrated in the WaLBerla software framework. We propose a multi-GPU implementation using a block-structured MPI parallelization, suitable for load balancing...
Locality-Aware Work Stealing on Multi-CPU and Multi-GPU Architectures
Most recent HPC platforms have heterogeneous nodes composed of a combination of multi-core CPUs and accelerators, like GPUs. Scheduling on such architectures relies on a static partitioning and cost model. In this paper, we present a locality-aware work stealing scheduler for multi-CPU and multi-GPU architectures, which relies on the XKaapi runtime system. We show performance results on two den...